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Abstract 

We demonstrate that substantial progress can be achieved in the study of the phase struc- 
' ture of 4-dimensional compact QED by a joint use of hybrid Monte Carlo and multicanonical 

, algorithms, through an efficient parallel implementation. This is borne out by the observa- 

tion of considerable speedup of tunnelling between the metastable states, close to the phase 
Q\ ' transition, on the Wilson line. We estimate that the creation of adequate samples (with order 

f*"*) , 100 flip-flops) becomes a matter of half a year's runtime at 2 Gflops sustained performance 

OO ■ for lattices of size up to 24 4 . 

On 

1 Introduction 

It appears exceedingly intriguing to define variants of QED by studying vacuum states other than 
the usual perturbative vacuum. Lattice techniques have the potential power to deal with this 
situation, whenever they provide us with phase transition points of second and higher orders. 

It is embarrassing that lattice simulations of compact QED still have not succeeded to clarify 
the order of the phase transition near j3 = 1, the existence of which was established in the clas- 
sical paper of Guth jjj. This is mainly due to the failure of standard updating algorithms, like 
metropolis, heatbath or metropolis with reflections (2[ ||, ||, |), to move the system at sufficient 
rate between the observed metastable states near its phase transition. The tunneling rates de- 
crease exponentially in L 3 and exclude the use of lattices large enough to make contact with the 
thermodynamic limit by finite size scaling techniques (FSS) 

In this paper, we propose to make use of the multicanonical (MUCA) algorithm within the 
hybrid Monte Carlo (HMC) updating scheme [|| in order to boost the tunneling rates. Since both 
algorithms are inherently of global nature, their combination will facilitate the parallelization of 
MUCA which could not be achieved otherwise. 

In the early days of simulations on the hyper-torus, the U(l) phase transition was claimed to 
be second-order ||, |l^, [ll], |l2), however, with increasing lattice sizes, metastabilities and double 
peak action distributions became manifest, strongly hinting at its first-order character |13, 14|, fi"5"| . 
This picture is in accord with various renormalization group studies (if], [lTj . 

However, the latent heat is found to decrease with the lattice size and the critical exponent v 
is neither 0.25 (first order) nor 0.5 (trivially second-order) [jl8|. These facts allow for two possible 
propositions: 

1. the double peak structure is a finite size effect and might vanish in the thermodynamic limit, 
leading to the signature of a second-order phase transition; 

2. the phase transition is weakly first-order, i.e., the correlation length £ is finite, but large in 
terms of the available lattice extent L; this would fake, on small lattices, the signature of a 
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second-order transition, and a stabilized value of the latent heat would only become visible 
in the thermodynamic limit L > £ . 

In search for a lattice formulation of QED with a second-order transition point, the action was 
generalized to include a piece in the adjoint representation with coupling 7 [ ^9| , in the expectation 
that the phase transition would be driven towards second-order, at sufficiently small negative val- 
ues of 7. However, simulations on the hyper-torus, up to 7 = —0.4, revealed the reappearance of a 
double peak on large enough lattices [|20|, Thus, the hypothesis of a second-order phase transi- 
tion at some finite negative value of 7 |j22|, |23], pi| is again doubted; furthermore, renormalization 
group investigations indicated that the second-order phase transition is located at 7 — > —00 [0. 

Ref. has speculated that the mechanism behind the lattice heuristics of metastabilities 
is driven by monopole loops that wrap around the hyper-torus. According to this scenario, the 
inefficiency of local updating algorithms to create and annihilate such monopole constellations 
causes their slowing down, in agreement with the earlier proposition (1). Results were presented 
in support of this view by switching to spherical lattices with trivial homotopy group where such 
wrapping loops are no more topologically stabilized [^6[ |?], ^8|, |29| . But on spherical lattices 
equivalent to L = 26 at 7 = —0.2, double peak structures have recently been reported to reappear 
pfj| , [2lf| , corroborating earlier observations with periodic boundary conditions at 7 = 0: the 
suppression of monopole loop penetration through the lattice surface turned out to be incapable 
of preventing the incriminated double peak signal to show up on large lattices, to say L = 32 

It appears that a clarification of the situation on the Wilson line is mandatory for further 
progress in the understanding of compact lattice QED! This challenge requires the design of more 
powerful updating algorithms. A promising method is based on simulated tempering |33, 34, 



enlarging the Lagrangian by a monopole term whose coupling is treated as an additional dynamical 
variable. Multi-scale update schemes in principle can alleviate critical slowing down (CSD) which 
is associated with the increase of the correlation length £ (as measured in a non-mixed phase) near 
the critical coupling, (3 C p6[ . However, the exponential supercritical slowing down (SCSD) which 
is a consequence of the surface tension at first-order phase transitions cannot be overcome by such 
type of scale-adapted methods. In that instance, one expects the autocorrelation times to grow 
exponentially with the system size due to the occurrence of two 3-dimensional interfaces, leading 
to 

t scsd oc exp (2ctL 3 ) . (1) 

Torrie and Valleau |37|, |3f| have shown how to generate arbitrary "non-physical" sam- 
pling distributions. Their method, termed "umbrella sampling", has been introduced to span large 
regions of phase diagrams. The method is capable to improve the efficiency of stochastic sampling 
for situations when dynamically nearly disconnected parts of phase space occur by biassing the 
system to frequent the dynamically depleted, connecting regions of configuration space. They 
interpreted their method as sampling "a whole range of temperatures" Q . 

In recent years the idea of "umbrella sampling" has been popularized and extensively applied 
under the name "multi-canonical algorithm" (MUCA) by Berg and Neuhau s U, ||, f42], pi, gl to 
the simulation of a variety of systems exhibiting first-order phase transitions ]45, 46 i7[ [48|, 4ij| . 50| 



In this procedure, the biassing weight w(S) of a configuration with action S is dynamically adjusted 
(bootstrapped) such as to achieve a near-constant overall frequency distribution over a wide range 
of S within a single simulation. 

Obviously, MUCA in principle offers a powerful handle to deal with SCSD. It remains then a 
practical question whether one can indeed proceed to large lattices by boosting tunneling rates from 
the SCSD behaviour (eq. (Q)) to the peak efficiency of local Monte Carlo methods (characterized by 
0((L A ) 2 ) complexity). This leads us to the key point of this paper: it is a severe shortcoming of the 
multi-canonical algorithm that its implementation is seemingly restricted to sequential computers, 
as it requires knowledge of the global action, even during local updating. We will show that HMC 
is from the very outset able to implement MUCA in a parallel manner. 
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In sections 2.1 and 2.2, we will give a short review of the MUCA and the HMC algorithms. In 
section 2.3, we merge MUCA with HMC. From our ongoing simulation project of U(l) theory on 
the Wilson line pl[ , we determine the tunneling efficiency compared to the standard metropolis 
algorithm which in our case is complemented by three reflection steps. In section [| we shall 
present our results for lattice sizes up to 16 4 and predict the tunnelling rates for lattice sizes up 
to 24 4 , as would be required for a proper FSS. 



2 Multicanonical Hybrid Monte Carlo 

The hybrid Monte Carlo (HMC) algorithm fl ^3) produces a global trial configuration by car- 
rying out a molecular dynamics evolution of the field configuration very close to the surface of con- 
stant action. Subsequently, a Monte Carlo decision is imposed which is based on the global action 
difference AS, being small enough to be frequently accepted. Within HMC, all degrees-of-freedom 
can be changed simultaneously and hence in parallel. This then provides the straightforward path 
to implement MUCA as part of HMC on parallel machines^]: one just uses the values of the global 
action, as provided by HMC, to compute the bias function w(S) for the MUCA algorithm. 



2.1 Multicanonical Algorithm 

"Canonical" Monte Carlo generates a sample of field configurations, {0}, within a Markov process, 
according to the Boltzmann weight, 

Pf,{S) = -±-e-f"M, (2) 

which follows from maximizing the entropy with respect to all possible probability distributions 
P[4>]. The partition function Z normalizes the total probability to 1, 

Z P = Adfle-^M. (3) 



S is the action (the energy in the case of statistical mechanics) and the coupling (or inverse 
temperature ™). 

The canonical action density which in general exhibits a double peak structure at a first-order 
phase transition, can be rewritten as 

NUS,(3)=p(S)e-P s , (4) 

with the spectral density p(S) being independent of (3. Usually, J N c ^ n (S, (3) dS is set to 1. 
The multicanonical approach aims at generating a flat action density 

N MUCA (S, 0) = const., for S min < S < S m „, (5) 

in a range of 5* that covers the double peaks at the first-order phase transition. 

Therefore, instead of sampling canonically according to e~@ s , one modifies the sampling by a 
weight factor IUmuca: 



W MVCA (S,p) = { P(5 j| e? 3 for S min < S < S^ (6) 

which is constant outside the relevant action range. Such 'corrigez la fortune' is equivalent to a 
net sampling according to the yet unknown probability distribution 

w{S) = for S mi » <S< S„ x . (7) 

1 A first attempt in this direction has been made in Ref. f^ ] in the framework of the Higgs- Yukawa model. 
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Since W(S, (3) is unknown at the begin of the simulation, it is instrumental for MUCA to follow 
Miinchhausen and bootstrap from good guesstimates p^ j. We shall do so by starting from an 
observed histogram of the canonical action density, A r can (S', /3 C ), see eq. (|J), at the supposed location 
of the phase transition^], /3 C . From the action density, we compute W muCA (S, (3 C ) according to 
eq. (||). The sampling then proceeds with the full MUCA weight, 

■•MUCA 

oc e -M(S))s-a(«), (8 ) 

This latter formulation can be interpreted as a simulation proceeding at varying couplings (tem- 
peratures), hence the naming ' multicanonicaF . 

In order to compute expectation values of observables O, one has to reweight the resulting 
action density at the end of the day by the factor W MUCA (S, f3 c ), which reconstitutes the proper 
canonical density: 



^ ^ _ Pc W-MUCA(Si,(3 a ) ^ 

Ez 



^MUCA(S„ft) 

Additionally, (Op J simulated at (3 C can be reweighted to any desired (3 (following ^5|), given that 
the corresponding region of phase space has been covered by the MUCA simulation. We emphasize 
that eq. (^) is only useful complemented by a proper error analysis. The canonical error computed 
from the multicanonical ensemble has been elaborated in Ref. pq |. 

Note that there are many possible choices for the form of the multicanonical weight. Just for 
technical reasons we require it to be continuous in S. One can either guess an analytic function, or 
choose a polygonal approximation such as given in eq. (^|). In this case, the multicanonical weight 
is expressed in terms of the functions (3(S) and a(S) which are actually characteristic functions 
of the bins. (3(S) can be considered as an effective temperature p7| . 

The computation of the weights requires the knowledge of the global and not just the local 
change in action for each single update step. For this reason, even for a local action, one cannot 
perform local updating moves in parallel, such as the well-known checkerboard pattern. As a 
consequence, MUCA is not parallelizable for local update algorithms. For remedy, we propose 
here to go global and utilize the HMC updating procedure. 

2.2 Hybrid Monte Carlo 

The HMC consists of two parts: the hybrid molecular dynamics algorithm (HMD) evolves the 
degrees of freedom by means of molecular dynamics (MD) which is followed by a global Metropolis 
decision to render the algorithm exact. 

In addition to the gauge fields M (x) one introduces a set of statistically independent canonical 
momenta 7r^(x), chosen at random according to a Gaussian distribution exp(— ^). The action 
S[4>] is extended to a guidance Hamiltonian 

Starting with a configuration (cA, it) at MD time t = 0, the system moves through phase space 
according to the equations of motion 

, dU 

d-Kfj, 



2 Hatted quantities refer to stochastic estimates. 
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leading to a proposal configuration (</>', 7T') at time t = r. Finally this proposal is accepted in a 
global Metropolis step with probability 

P acc = min(l, e - AW ), with AH = H[<f>', n r \ -H[(f>,ir]. (12) 

The equations of motion are integrated numerically with finite step size At along the trajectory 
from t = up to t = iV md . Using the leap-frog scheme as symplectic integrator the discretized 
version of eq. (|il|) reads: 



ir n+1 



At 2 d 
At d / r ,„,\ At d 



"" f A/ ' ;T " ~ir^( wn] ) 



(w])-Y^( wn+1] )- (13) 



Here we have presented the scheme with both the momenta and the gauge fields defined at full 
time steps^j, t = nAt. 

To ensure that the Markov chain of gauge field configurations reaches a unique fixed point 
distribution exp(— S[tfi]) one must require the updating procedure to fulfil detailed balance, which 
is guaranteed by the iterative map of eq. (O) / : (tfi, n) — > (<fi' , it') being 



• time reversible: f (</>', —w') = ((f), ~~tt) 

• and measure preserving: [d<j>'\ [dn 1 ] = [ti(^][d7r]. 

It is easy to prove that these two conditions also hold for the multicanonical action. Note that 
the guidance Hamiltonian, eq. ([lO]), defining the MD may differ from the acceptance Hamiltonian 
in eq. (|l2|), which produces the equilibrium distribution proper. In the following, we shall exploit 
this freedom to develop two variant mergers of MUCA and HMC. 

2.3 Merging MUCA and HMC for Compact QED (MHMC) 

We consider a multicanonical HMC for pure 4-dimensional U(l) gauge theory with standard 
Wilson action defined as 

S 'M= E [l-cos^a;))], (14) 

where 

Onu(x) = <A/i(ar) + 4>v(x + A) + <t>v{x) 

is the sum of link angles that contribute to one of six plaquettes interacting with the link angle 

Eq. U suggests to consider an effective action S including the "multicanonical potential" Vm UC a 

S = (] c S + V MVCA (Sj c ), (15) 

with 

V mcA (Sj c ) = tog( 1 ) 

V W MUCA (b,p c ) / 

= (3(S)S + a(S). (16) 
There are two natural options to proceed from here: 



3 Note that the actual implementation computes the momenta at half time steps according to the sequence 

7r(i + Ai/2) = 7r(t - At/2) - At— (/3S[0(t)]), 

d<f> 

<j>{t + At) = <j>{t) + At ■ n(t + At/2), 
initialized and finished by a half-step in ?r H. Each sequence approximates the correct H with an error of 0(At :i ). 



5 



Method 1 performs molecular dynamics using the canonical guidance Hamiltonian 

Hmd = 2 X/ 71-2 + ^ c ^' 

with standard action S. The resulting gluonic force is given by 

F(x) = TT^(x) 

= j3 c [sinfl^as — 0) — sm6^ v {x) . (17) 
Method 2 makes use of the multicanonical potential as a driving term within the Hamiltonian, 

Hmd, = 2 71-2 + S> 

inducing an additional drift term 

ii^x) = F{x) - -^j^V UVCA {s[<f>{x)],0 c ) 

= {p e + hS))Y^ [sin^ w (a;-2>)-sin^(x)], (18) 

with the effective /3 as defined in eq. (j|). 
For both options, the Hamiltonian governing the accept/reject decision, eq. (|l2|), reads equally: 

The latter method is governed by the dynamics underlying the very two peak structure: as one 
can see from Fig. [| Vmuca is repelling the system out of the hot (cold) phase towards the cold 
(hot) phase, thus increasing its mobility and enhancing flip-flop activity. 

We comment that the implementation of method 2 requires the computation of the global 
action (to adjust the correct multicanonical weight, eq. (g)) at each integration step along the 
trajectory of molecular dynamics to guarantee reversibility. In the polygon approximation, this 
amounts to a determination of the effective coupling (3 at each time step in MD. Note that a 
does not influence MD but enters into the global Metropolis decision, eq. (12]). Method 1 is much 
simpler, running at fixed trial coupling [3 C and avoiding the effort of computing N md — 2 global sums 
while travelling along the trajectories. It turned out that of both versions of MHMC, method 2 
performs better than method 1, when autocorrelation and difference in computational effort are 
taken into account. Thus, we continue our analysis by investigation of method 2. 

3 Results 

In order to evaluate the efficiency of the MHMC and to set the stage for a proper extrapolation, we 
generated time series of the U(l) action on lattices of size 6 4 up to 16 4 . The runs are summarized 
in Tab. [I]. We applied method 2 as being the more promising one on real large lattices. 

To arrive at an action density N muCA (S, f3) which is approximately flat in the desired region 
between the two peaks it is crucial to find a good estimate N c ^ a (S, f3 c ) of the canonical action 
density. However, it becomes more and more delicate for large volumes to find the proper multi- 
canonical weight, -Pmuca('S') ( ec l- (P))- shows the evolution of the action density as the lattice 
size increases. For lattices < 16 it was sufficient to perform a short canonical run to generate 
an action density N MUCA suitable to compute a proper weight factor Wmuca- On the 16 4 system, 
however, the resulting multicanonical distribution becomes quite sensitive to the choice of the 
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1.010800 


1.210.000 




0.045 
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Table 1: Total numbers of sweeps carried out both for MRS and MHMC at different lattice sizes 
L and couplings j3. 

0.025 
0.02 
_ 0.015 

CO 

<z 0.01 
0.005 


0.32 0.34 0.36 0.38 0.4 0.42 0.44 0.46 
s 

Figure 1: The canonical action densities, eq. (|4|), on 6 4 up to 16 4 lattices, reweighted to their 
respective j3 c values, here defined via equal height of the histograms. 




weight. Therefore, in the case of large lattices (L > 16) we cannot rely on canonical simulations to 
start with. Even if we perform O(10 6 ) sweeps using standard Metropolis update with 3 reflection 
steps^, SCSD prevents a sufficiently accurate determination of the phase weight. Therefore, we 
install a recursive procedure: from a previous guess Ni(S, fl z c ) we go through MHMC and arrive at 
N i+1 (S, (3 l c +1 ). This computational scheme is initialized by a standard canonical "short run". We 
found that one such learning cycle is sufficient. Fig. || illustrates the evolution of the multicanonical 
action density on the 16 4 lattice. 

On larger volumes the determination of a good guess can be considerably boosted by a crank-up 
extrapolation that starts from smaller systems [|]|. In Fig. ||, we display the quality of "flatness" 
of N M vca(S, (3) achieved in our investigations for the various lattice sizes. 

3.1 Tunneling Behaviour 

With our estimate for Nm UCA {S) at /3 = 1.010753, depicted in Fig. |^, we have generated the time 
history of the action per site, s = S/6V, as shown in the upper part of Fig. ^. For reference, we have 
included the time history from the MRS algorithm on the same lattice. The figure demonstrates 

4 The metropolis algorithm with reflection steps (MRS) is considered as a very effective local update algorithm 
for U(l) §. 
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0.012 




Figure 2: Evolution of multicanonical action density. Starting from a canonical run a first MHMC 
simulation is performed (dashed line), and, based on this result, the final run (solid line) is carried 
out. Both curves are computed on the 16 4 lattice at j3 = 1.0f0753. 
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Figure 3: N MVCA (S, (3) at run-parameter (3 as given in Tab. 



the success of MHMC: the method provides us with a gain factor in tunneling rate of about one 
order of magnitude on the 16 4 lattice. 

In order to quantify this achievement, we introduce the average flip time, r tlip , a quantity that 
is readily measurable. r tlip is defined as follows: we histogram the time series of s using N bins, 
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Figure 4: Time history of the 16 4 system at (3 = 1.010753 for MHMC (top) and MRS (bottom). 

as illustrated in Fig. ^. A suitable number is N = 8. The total binning range is adjusted such 
0.025 |— — . 1 1 . 1 1 . 1 




0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.4 

> 2 > 3 '4 > 5 'e 'r 'a s 

Figure 5: Action density of the 16 4 system at /3 = 1.010753 as a function of s — from which 
Vmuca is derived (the dashed line shows exp(V r MUCA )). The subdivision of the support of the action 
density into 8 intervals is introduced in order to define r flip . 

that 99.9 % of the events are covered symmetrically by the 8 bins. A flip (flop) is given when the 
system travels from Ig to I\ (and vice versa). r flip is defined as the inverse number of the sum 
of flips and flops multiplied by the total number of trajectories. In Tab. T mp is given for the 
various lattices. The error in r flip has been computed by a jackknife error analysis. 

3.2 Scaling Behaviour 

With the results for T nip on lattices up to 16 4 we are in the position to estimate the scaling 
behaviour of MHMC in comparison to standard MRS updates. Fig. ^ shows r flip both for MHMC 
and MRS as a function of the lattice size L at their respective /3-values, as listed in Tab. ||. 
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—MRS 
'flip 


T MHMC 
'flip 


6 


1.001600 


508(12) 


650(20) 


8 


1.007370 


1023(60) 


1173(50) 


10 


1.009300 


2474(117) 


2006(84) 


12 


1.010143 


5470(770) 


3260(440) 


14 


1.010668 


16400(3300) 


5090(630) 


16 


1.010753 


44800(9700) 


6350(860) 



Table 2: r flip for MRS and MHMC measured at the simulated /3's. 
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1e+07 
1e+06 
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L 

Figure 6: Tunneling times for MRS (exponential fit) and MHMC (lower convex curve is power 
law, upper convex curve is exponential fit). The errors of the two exponential fits arc depicted as 
dotted lines. The error of the power law fit, eq. (p2|), is not visible on this scale. 

According to the expected exponential behaviour of t" rs which, in the asymptotic regime 
L — > oo, should be given by eq. (|l|), we perform a x 2 ~fit with the ansatz: 

r; ES =aL k e clJ . (20) 

It yields the following parameter values: 

a = 11.9(3.7) 
b = 2.01(18) 

c = 6.7(8) 10~ 4 , (21) 

with Xpcrdof = 0.897. As a result, we find a clear exponential SCSD behaviour for the MRS 
algorihmpl 

On the other hand, for the tunneling times of the MHMC, we expect a monomial dependence 
in L: 

T™™ C =PL\ (22) 

J One is tempted to extract the interfacial surface tension a from the fit to the MRS data. We find 

a = 3.35(39) lCT 4 . 
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We obtain for the fit parameters: 

p = 11.6(1.6) (23) 
q = 2.238(68). (24) 

The power law ansatz is well confirmed by the fit quality with x 2 cr d o f = 0.795. 

We also took the pessimistic ansatz and tried to detect a potentially exponential increase of 
T mp M °- The exponential fit gives xLrdo.t = 0.975. As can be seen in Fig. |6[ the exponential 
contribution remains suppressed in the extrapolation. A potentially dominating exponential be- 
haviour for MHMC can only be detected in future MHMC simulations on larger lattices. In other 
words, parallel MHMC is capable to overcome SCSD in compact QED in practical simulations, at 
least up to lattices sizes « 24 4 . 



4 Cost Estimates for a FSS Study. 

Finally, we try to assess the compute effort required to perform a FSS study on a series of lattices 
ranging up to 24 4 . 

r flip being readily accessible, we relate this quantity to the effective integrated autocorrelation 
time, T cff , defined in Ref. Q by 

ff LcA =(r L^- ( 25 ) 

* " ts 

°muca i s the squared error of the observable O computed from the multicanonical ensemble, 
see eq. (^). of an denotes the canonical variance of O (computed from the reweighted canonical 
ensemble) and N ts is the length of the multicanonical time series. We can determine cr^, UCA in a 
numerically quite stable way from jackknife blocking. From the time series of <S* on the 14 4 and a 
16 4 MUCA system, with about 550000 and 300000 entries, respectively, we have determined T e ff 
to be 

r 14 = 35(10) 10- 3 x r 14 , 

7-i 6 = 41(12) 10- 3 x t™. (26) 

As a result, we found r tlip w 0.038(8) x T cff []. From hereon we can estimate the number of de- 
correlated subsamples (independent measurements) out of a time series of length N ts to be roughly 



Nts_ = 

' indop 2r e « 79(16) 10- 3 r flip 



N iadep — — ^ n , 1 ^ 1 t ;_,_ ■ (27) 



Assuming the inverse square-root of JV indop to be an upper bound to the relative error r of an 
observable O, we arrive at 

with K being the required number of flips to achieve a relative error < r. 

We thus conclude that 0(100) flip-flops might allow to determine quantities like the specific 
heat and the Binder-Landau cumulant with a relative error of 3 %. 

Obviously, the costs of MHMC and MRS simulation increase with the volume of the lattice, V — 
L 4 . Additionally, for MHMC, we want to keep the average acceptance probability of the leapfrog 
scheme constant. To this end, we have to lower the step size according to At ~ V^ 1 ^ 4 . In a detailed 
tuning investigation we have confirmed that the scaling rule of constant acceptance probability 
f37f leads to optimal performance. From a spectral analysis of the molecular dynamics we can 

6 Note that T flip strongly depends on the difference between Ig and I\ in Fig. |^. It remains to be confirmed that 
the ratio between r e a and Tfii p does not vary too much going to larger lattices. 
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find an optimized trajectory length, N m d, (in the sense that the average acceptance probability 
is maximized) fulfilling At N mc i = const (/3), with only a slight dependence on (3 near the phase 
transition. We choose step-size At according to 

(F acc ) = cy{c{c{P)V^M 2 ^ = const., (29) 

adjusted such that the product T int N m d is minimized finally. In our case, the optimal acceptance 
probability is 65 % @. 

Fig. confirms the scaling of MHMC: the ratio of measured CPU-times for a sweep, *^^ g A 
increases quite linear with L = U 1 / 4 . Minor deviations from this behaviour stem from the use of 
suboptimal run parameters At and N md . 




6 8 10 12 14 16 18 20 22 24 

L 



Figure 7: Ratio of CPU-times per sweep *™^ A and linear fit. Errors are not visible on this scale. 

The product T nip x t MVCA reflects the efficiency of MHMC. Tab. || lists the effective gain factors 
achieved taking MHMC instead of MRS. The two columns refer to the power law and exponential 
extrapolations for MHMC. 



L 


power 


exponential 


16 


1.9(3) 


1.9(3) 


18 


5.5(1.5) 


5.1(1.6) 


20 


21.0(8.7) 


18.4(9.0) 


22 


112(67) 


92(65) 


24 


856(700) 


652(638) 



Table 3: Gain factor for MHMC over MRS as function of the linear lattice extension. The 
prediction for t™ mc is based on a power law ansatz (row 1) and an exponential ansatz (row 2). 

Let us finally translate these factors into real costs: in Fig. ||, we extrapolate the sustained 
CPU time in Tflop-hours required to generate one flip. We conclude, that the integrated CPU time 
to generate the required 100 flips with MHMC on a 24 4 lattice amounts to about 3 Tflop-hours 
sustained CPU time. 
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100 




Figure 8: Sustained CPU time in Tflop-hours to generate one flip. 



5 Summary and Outlook 

We have demonstrated that the fully parallel MHMC algorithm is a very effective tool which is 
able to overcome SCSD as present in the pronounced metastabilities of 4-dimensional U(l) gauge 
theory. A FSS study up to a lattice size of L — 24 with about 100 flip events for each lattice is 
feasible within half a year runtime, given a sustained performance of about 2 Gflops, due to the 
improvements achieved by MHMC. These performance figures should be obtainable on a 32 node 
partition of a Cray T3E-600. 

Less well known is the influence of the delicate part of MHMC, i.e. the determination of 
a suitable estimate for V muCA , which is carried out in an iterative manner. So far, we have 
encouraging experiences on the 16 4 lattice. The success of the crank-up procedures described 
in Ref. J43| gives us hope that the P M uca determination will carry through with only marginal 
deterioration of the improvement factors estimated here. 

The investigations presented form part of an ongoing study that aims at a conclusive FSS 
analysis of compact QED on the Wilson line |5l[] . 
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